1,323 research outputs found

    Hierarchic Bayesian models for kernel learning

    Integrating diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide better performance than that obtained from any single data source. We present a Bayesian hierarchical model that enables kernel learning, together with effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method.
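As a rough illustration of what "a combination of base kernels" means, the sketch below forms a weighted sum of two Gram matrices. It makes several assumptions that are not stated in the abstract: the combination is taken to be convex (non-negative weights summing to one), the two base kernels and the helper names (`rbf_kernel`, `linear_kernel`, `gram`, `combine_kernels`) are hypothetical, and the weights are fixed by hand rather than inferred by the hierarchical Bayesian model the paper describes.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def linear_kernel(x, y):
    """Plain dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))

def gram(kernel, points):
    """Gram matrix of a kernel evaluated on every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]

def combine_kernels(gram_matrices, weights):
    """Weighted sum of Gram matrices.

    Assumption: a convex combination (weights >= 0, summing to 1).
    The weights are supplied by the caller; they are not learned here.
    """
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(gram_matrices[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, gram_matrices)) for j in range(n)]
        for i in range(n)
    ]

# Toy data: three points in two dimensions.
points = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K_rbf = gram(rbf_kernel, points)
K_lin = gram(linear_kernel, points)
K = combine_kernels([K_rbf, K_lin], [0.25, 0.75])
```

The combined matrix `K` stays symmetric because each base Gram matrix is symmetric. In a kernel-learning method the weights would be estimated from data instead of being set by hand as they are here.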

    Variational Bayesian multinomial probit regression with Gaussian process priors

    It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multi-class setting. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure, from which sparse approximations, such as multi-class Informative Vector Machines (IVM), emerge in a natural and straightforward manner. This is the first time that a fully Variational Bayesian treatment for multi-class GP classification has been developed without resorting to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and with Laplace approximations illustrate the utility of the variational approximation as a computationally economic alternative to full MCMC, and show it to be more accurate than the Laplace approximation.

    Semi-parametric analysis of multi-rater data

    Datasets that are subjectively labeled by a number of experts are becoming more common in tasks such as biological text annotation, where class definitions are necessarily somewhat subjective. Standard classification and regression models are not suited to multiple labels, and typically a pre-processing step (normally assigning the majority class) is performed. We propose Bayesian models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive distributions. The models make use of Gaussian process priors, resulting in great flexibility and particular suitability to text-based problems where the number of covariates can be far greater than the number of data instances. We show that using all labels rather than just the majority improves performance on a recent biological dataset.

    The latent process decomposition of cDNA microarray data sets

    We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it can represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To show its wider applicability, we also report its performance on a microarray data set for yeast.

    Probabilistic assignment of formulas to mass peaks in metabolomics experiments

    Motivation: High-accuracy mass spectrometry is a popular technology for high-throughput measurements of cellular metabolites (metabolomics). One of the major challenges is the correct identification of the observed mass peaks, including the assignment of their empirical formula, based on the measured mass.

    Results: We propose a novel probabilistic method for the assignment of empirical formulas to mass peaks in high-throughput metabolomics mass spectrometry measurements. The method incorporates information about possible biochemical transformations between the empirical formulas to assign higher probability to formulas that could be created from other metabolites in the sample. In a series of experiments, we show that the method performs well and provides greater insight than assignments based on mass alone. In addition, we extend the model to incorporate isotope information to achieve even more reliable formula identification.
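For context, the "assignments based on mass alone" that the abstract compares against can be sketched as a simple tolerance match between an observed mass and the theoretical masses of candidate formulas. This is only the baseline idea, not the probabilistic method the paper proposes. The candidate list, the function names, the tolerance values, and the use of neutral monoisotopic masses (ignoring adducts and ionization) are all assumptions made for illustration.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}

def formula_mass(formula):
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def candidates_within(observed_mass, formulas, tolerance_ppm=5.0):
    """Return (name, ppm error) pairs inside the tolerance, best match first."""
    hits = []
    for name, formula in formulas.items():
        err = ppm_error(observed_mass, formula_mass(formula))
        if abs(err) <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))

# Hypothetical candidate formulas with nearby nominal mass (180 Da).
CANDIDATES = {
    "C6H12O6": {"C": 6, "H": 12, "O": 6},
    "C9H8O4": {"C": 9, "H": 8, "O": 4},
    "C10H12O3": {"C": 10, "H": 12, "O": 3},
}

tight = candidates_within(180.0634, CANDIDATES, tolerance_ppm=5.0)
loose = candidates_within(180.0634, CANDIDATES, tolerance_ppm=200.0)
```

With a tight tolerance only one candidate survives, but at a looser tolerance several formulas match the same peak. Ambiguity of this kind is the sort of problem that additional information, such as the biochemical transformations mentioned in the abstract, is meant to help resolve.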

    Acquisition efficiency of Flavescence dorée phytoplasma by Scaphoideus titanus Ball from infected tolerant or susceptible grapevine cultivars or experimental host plants

    The rate of Flavescence dorée phytoplasma (FDP) acquisition by the leafhopper vector Scaphoideus titanus Ball was tested under field and glass house conditions confining healthy reared nymphs on canes of FDP-infected grapevines or on FDP-infected cuttings collected in the field during the dormant season. Acquisition tests were performed using FD-tolerant (Merlot) or highly susceptible (Pinot blanc) grapevine cultivars, or alternatively using experimentally infected broadbean plants. Frequency of FDP acquisition by leafhoppers was evaluated using a polymerase chain reaction (PCR) assay. Different batches of insects were confined on the same infected source plants in the vineyard for acquisition access periods (AAP) of 7 d at a time at intervals of 15-20 d during spring and summer. When diseased Pinot blanc grapevines were used as source plants, acquisition by leafhoppers and transmission to healthy grapevines increased over summer, while almost no acquisition or transmission was observed when diseased Merlot grapevines were used as source plants. Tests conducted under controlled conditions confirmed that Merlot is a poorer source of FDP than Pinot blanc; the optimum FDP source for S. titanus was broadbean although this plant is not a natural host of the leafhopper. It is assumed that grapevine cultivars play an important role in influencing the proportion of FDP-infected leafhoppers in the vineyards and therefore influencing the rate of disease progress.